A CRF-based approach to automatic disfluency detection in a French call-centre corpus

Authors

  • Camille Dutrey
  • Chloé Clavel
  • Sophie Rosset
  • Ioana Vasilescu
  • Martine Adda-Decker
Abstract

In this paper, we present a Conditional Random Field (CRF) based approach to the automatic detection of edit disfluencies in a French conversational telephone corpus. We define disfluency patterns using both linguistic and acoustic features to perform disfluency detection. Two related tasks are considered: the first task aims at detecting the disfluent speech portion proper, or reparandum, i.e. the portion to be removed if we want to improve the readability of transcribed data; in the second task, we also aim at identifying the corrected portion, or repair, which can be useful in follow-up discourse and dialogue analyses or in opinion mining. For these two tasks, we present comparative results as a function of the type of features involved (acoustic and/or linguistic). Generally speaking, the best results are obtained by CRF models combining both acoustic and linguistic features.

Index Terms: disfluencies, conditional random fields, conversational speech, spontaneous speech.
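As an illustration only, the two tasks described in the abstract can be read as token-level sequence labelling problems, which is the form a CRF usually takes as input. The sketch below is not the authors' implementation: the tag names (`RM`, `RP`, `O`), the feature names, the example utterance, and the pause values are all assumptions made for this example, since the abstract does not specify the label scheme or the feature set.

```python
from typing import Dict, List, Optional, Tuple, Union

Span = Tuple[int, int]  # half-open token span: [start, end)
Feature = Union[str, bool, float]


def label_tokens(n_tokens: int, reparandum: Span,
                 repair: Optional[Span] = None) -> List[str]:
    """Build per-token labels; pass `repair` only for the second task."""
    labels = ["O"] * n_tokens
    for i in range(*reparandum):
        labels[i] = "RM"  # hypothetical tag for reparandum tokens
    if repair is not None:
        for i in range(*repair):
            labels[i] = "RP"  # hypothetical tag for repair tokens
    return labels


def token_features(tokens: List[str], pauses: List[float], i: int,
                   linguistic: bool = True,
                   acoustic: bool = True) -> Dict[str, Feature]:
    """Toy feature dict for token i; both feature families are assumed."""
    feats: Dict[str, Feature] = {}
    if linguistic:
        word = tokens[i].lower()
        recent = [t.lower() for t in tokens[max(0, i - 3):i]]
        feats["word"] = word
        feats["repeats_recent_word"] = word in recent
    if acoustic:
        feats["pause_after_s"] = pauses[i]  # silence after the token, seconds
    return feats


# Invented example: "je veux" is restarted as "je voudrais".
tokens = ["je", "veux", "je", "voudrais", "un", "rendez-vous"]
pauses = [0.0, 0.4, 0.0, 0.0, 0.0, 0.0]  # invented values

task1 = label_tokens(len(tokens), reparandum=(0, 2))
task2 = label_tokens(len(tokens), reparandum=(0, 2), repair=(2, 4))
```

Here `task1` marks only the reparandum, while `task2` additionally marks the repair. Switching the `linguistic` and `acoustic` flags mirrors, in a very simplified way, the comparison of feature types mentioned in the abstract; the resulting feature dictionaries and labels would then be handed to a CRF toolkit of one's choice.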

Similar resources

DisMo: A Morphosyntactic, Disfluency and Multi-Word Unit Annotator. An Evaluation on a Corpus of French Spontaneous and Read Speech

We present DisMo, a multi-level annotator for spoken language corpora that integrates part-of-speech tagging with basic disfluency detection and annotation, and multi-word unit recognition. DisMo is a hybrid system that uses a combination of lexical resources, rules, and statistical models based on Conditional Random Fields (CRF). In this paper, we present the first public version of DisMo for ...

A comparison between a DNN and a CRF disfluency detection and reconstruction system

We propose to compare between a Deep Neural Network and a Conditional Random Field disfluency detection and reconstruction system, both trained on the same features. Deep Neural Networks, despite an increasing popularity in a multitude of speech and language related tasks, were never applied to disfluency recognition. One of the most difficult classes of disfluency is false starts. We are inter...

Comparing HMM, maximum entropy, and conditional random fields for disfluency detection

Automatic detection of disfluencies in spoken language is important for making speech recognition output more readable, and for aiding downstream language processing modules. We compare a generative hidden Markov model (HMM)-based approach and two conditional models — a maximum entropy (Maxent) model and a conditional random field (CRF) — for detecting disfluencies in speech. The conditional mo...

Tight Integration of Speech Disfluency Removal into SMT

Speech disfluencies are one of the main challenges of spoken language processing. Conventional disfluency detection systems deploy a hard decision, which can have a negative influence on subsequent applications such as machine translation. In this paper we suggest a novel approach in which disfluency detection is integrated into the translation process. We train a CRF model to obtain a disfluen...

應用不定長度特徵之條件隨機域於口語不流暢語流修正 (Disfluency Correction of Spontaneous Speech using Conditional Random Fields with Variable Length Features) [In Chinese]

This paper presents an approach to detecting and correcting edit disfluency based on conditional random fields with variable-length features. The variable-length features consist of word, chunk and sentence features. Conditional random fields (CRF) are adopted to model the properties of the edit disfluency, including repair, repetition and restart, for edit disfluency detection. For the evaluat...

Publication date: 2014